IBIS Macromodel Task Group

Meeting date: 09 December 2014

Members (asterisk for those attending):
Altera:                       David Banas
ANSYS:                      * Dan Dvorscak
                            * Curtis Clark
Avago (LSI):                  Xingdong Dai
Cadence Design Systems:     * Ambrish Varma
                              Brad Brim
                              Kumar Keshavan
                              Ken Willis
Ericsson:                     Anders Ekholm
IBM:                          Steve Parker
Intel:                      * Michael Mirmak
Keysight Technologies:        Fangyi Rao
                            * Radek Biernacki
Maxim Integrated Products:    Hassan Rafat
Mentor Graphics:            * John Angulo
                            * Arpad Muranyi
Micron Technology:          * Randy Wolff
                              Justin Butterfield
QLogic Corp.:                 James Zhou
                              Andy Joy
eASIC:                        Marc Kowalski
SiSoft:                     * Walter Katz
                            * Todd Westerhoff
                            * Mike LaBonte
Synopsys:                     Rita Horner
Teraspeed Consulting Group:   Scott McMorrow
Teraspeed Labs:             * Bob Ross

(Note: Agilent has changed to Keysight)

The meeting was led by Arpad Muranyi.

------------------------------------------------------------------------
Opens:

- Arpad reviewed our upcoming meeting schedule.
  - This is our last meeting of the year.
    [no objections]

--------------------------
Call for patent disclosure:

- None.

-------------
Review of ARs:

- Walter to send the updated C_comp Model BIRD draft to Mike for posting.
  - Done.
- Todd to produce slides for the co-optimization requirements discussion.
  - Presenting today.
- Arpad to review the IBIS spec for min/max issues.
  - In progress.

-------------
New Discussion:

Co-optimization:
- Arpad - Motion to untable the Back-channel discussion.
- Todd - Second.
  [no one opposed]
- Todd showed the "IBIS-AMI and Co-optimization" presentation.
- Todd - Today we want to talk about:
  - What do we understand the requirements to be?
  - What is the problem to solve?
  - What are the required elements of the solution?
- Todd - We will gloss over many details.
  - Lay it out from requirements down to a high-level flow.
  - Changes to .ami files and .dlls.
  - More important to make sure we all understand what we are presenting.
  - Happy to take questions along the way.
- Todd - I went to Wikipedia to see if the words we use are defined.
  - Optimization - you have a numerical metric you can maximize or minimize.
  - Co-optimization, link training, back channel - all comical definitions.
  - These three terms don't have a standard definition we need to follow.
- Todd - What is the problem we are trying to solve?
  - What are the requirements for a potential solution?
  - Simulation models for devices that have a hardware runtime back channel.
  - Co-optimize the Tx and Rx at runtime.
  - Cadence's original proposal.
  - The implication here is that the models try to follow the hardware
    optimization protocol as closely as possible.
  - We recognize that whenever we optimize a system, each decision affects
    the subsequent decisions (search path) and the final landing point.
  - We may have local minima and maxima.
  - We get a locally optimal solution because we are trying to model the
    exact search path; the search path matters.
  - Designer's questions:
    - Will the link converge?
    - Do I need to pick presets?  Which one?
    - What are the final eye margins?
- Todd - Scenario #1 requirements:
  - Emulate the hardware training protocol as literally as possible.
    - Models must communicate at simulation runtime to do this.
    - This is what Cadence proposed to begin with.
  - What are our operating margins after training?
  - Want cross-vendor (model maker) support.  Interoperability.
  - Want to report the optimized IP settings (taps, etc.).
    - So we can correlate with lab measurements and verify.
  - Want a constrained optimization based on actual IP capabilities.
    - Want it based on actual tap granularities, for example.
  - Some semiconductor vendors have said they want:
    - GetWave() based "literal" reproduction of what the hardware does.
    - Also want the option of trying to approximate it with Init().
    - Want support for a hardware starting point (hardware presets).
    - Want probes to work correctly.
      - If we probe a Tx output or somewhere in the channel, we see the
        right behavior.
- Todd - Scenario #2 requirements:
  - Seem almost exactly like #1 from a modeling standpoint.
    - May be the source of a lot of the confusion.
  - Closely related to #1.
    - Why SiSoft has been adamant about one spec that covers both.
  - Trying to optimize settings for hardware that does not communicate at
    hardware runtime (no back channel).
  - From a simulation standpoint, we want our models to act that way anyway.
    - Means that the models are doing something beyond what the hardware
      does.
    - Means we're thinking about things that are search path independent.
    - Assumes the optimization is path independent, a global optimum.
  - The design question is:
    - If we can program the Tx and Rx, how should we program them to
      optimize the system's performance?
  - We are all familiar with Rx models that do optimization already.
    - DFE, CTLE, VGA, Super loops.
    - This is beyond that.  It's an Rx model that knows how to talk to
      some Tx.
  - Designer's questions are similar (to #1):
    - Can this link work with this IP?
    - How should I set them up?
    - What are the margins once I'm done?
  - Scenario #1 and #2 are almost identical for simulation methodology.
  - The problem to be solved is slightly different:
    - Scenario #1 - Predict what the hardware will do when it trains itself.
    - Scenario #2 - Program the hardware up front and let the Rx adjust
      itself.
- Michael Mirmak - I'm interpreting you to mean:
  - Scenario #1 - The objective is to duplicate the algorithm that is in
    the silicon.
    - Therefore, the implication is that we should defer to the model,
      since it should be written to represent the actual silicon.
    - Getting the Tx and Rx to communicate is then the issue.
  - Scenario #2 - The system simulation person might want to use their own
    optimization technique.  The tool will have to get more involved.
  - Is that a fair summary?
- Todd - The million dollar question for later is, "Who does the
  optimizing?"
  - We will propose that the receiver does it.
  - Some of my customers have systems with thousands of links in them.
    - Systems where Txs don't talk to Rxs.
    - Even if you put the Rx in auto mode, you need to know how to program
      the Tx.
- Arpad - A variation on what Mike said is:
  - Not only does the user think they can do a better job with the
    optimization, the silicon itself might not do any.
- Todd - [continuing to the next slide]
  - Why Scenario #2?
    - A starting point for lab validation.  Figure out how to program the
      Txs.
    - Need settings on a per-link basis.
      - Want to optimize them.
    - Currently people often bin channels according to length (loss).
      - Not good enough if you want to get every last bit of margin.
      - Some customers want this per-link optimization.
    - Enabling AMI models to communicate and cross-optimize in the general
      case, even when the hardware doesn't do it.
  - Does that make sense?
- Arpad - Yes, that is what my comment was stating.
- Todd - Scenario #2 requirements:
  - Similar but a bit different.
  - Want to know the optimized settings and margins.
    - Particularly want to know the optimized settings.
    - Need to set up the hardware.  Txs won't set themselves.
    - Want to be able to extract optimized Tx settings.
  - Want cross-vendor interoperability.
  - Need a way to prove the results are valid.
    - We use this optimization and get some settings for the Tx and Rx.
    - A subsequent simulation run with the same settings programmed directly
      should give the same result.
  - High throughput is important.
    - Optimize 4000 links in an overnight run as a goal.
  - Fully constrain the optimized solution based on actual IP settings.
  - Probes work correctly.
  - We want user-selectable optimization criteria.
    - Since we're not duplicating a specific hardware algorithm in this
      case, we could have multiple options.
    - Might want to use different metrics for optimization.
- MM - On that fifth point ["Constrain solution based on IP capabilities"]:
  - I think we need to be extremely cautious if we say that without getting
    into the difference between "capabilities" and "algorithm."
  - It can make Scenario #2 sound like it's Scenario #1.
  - You could optimize according to the IP capabilities, but use an entirely
    different optimization algorithm from what the silicon has.
  - You could get a properly constrained optimization, but an optimization
    done with an entirely different algorithm, so you don't get the right
    results.
  - Constrain the answer, but not necessarily guarantee the same answer.
  - Many people don't see the distinction.
- Walter - My 2 cents.  Two questions to be answered:
  - 1. Does the model really represent the silicon?  Is it right?
  - 2. Even if the model is right, if the search algorithm in your software
    is not very good, you may not find the same minimum the hardware would
    find.
- MM - Excellent, yes.
- Walter - The constrained solution is based on the limits of each of the
  taps.
  - But it's not talking about the algorithm that searches the Tx space.
- MM - As long as that's abundantly clear.
- Walter - Yes, it has to be abundantly clear.
  - Hopefully we will make it clear in this presentation.
- Todd - [moving on] Introduce some terminology:
  - Adaptation - Any behavior in an AMI model that changes on a bit-by-bit
    basis (GetWave()).
  - Eye Quality Metric (EQM) - A numeric measure of eye quality.
    - This is what gets optimized.
  - Self Optimization - Adjusting internal behavior to optimize the EQM.
    - Mainly with Rx models.
    - Assumes the Rx model has some kind of EQM that it computes and uses
      to adapt.
  - Co-Optimization - Simultaneously adjusting the Tx and Rx by any method.
  - Co-Optimization by Proxy - The Rx model does the Tx optimization in
    place of the Tx.
    - A new concept we'll discuss here.
  - Key questions we try to answer:
    - What's being optimized?
    - By whom?
    - Local or global?
- Todd - [showing flow slides]
  - Basic AMI flow - no optimization, models just running.
  - Self Optimization flow:
    - Implies an additional logical block in the Rx .dll that can monitor
      the waveform, clocks, etc., compute an EQM, and come up with new
      settings.
    - A little control loop in the Rx model itself.
    - Shown in the flow as two distinct blocks even though it's in one .dll.
  - Co-Optimization by Proxy - A special case of Co-Optimization using a
    matched set.
    - Tx model equalization is disabled.
    - The Rx actually provides the Tx equalization in place of the Tx.
    - The Rx can ultimately report back its optimized settings for the Tx.
      - The user can plug them back into the Tx and run the analysis.
    - Pro - Requires no change in the current AMI flow.
      - Current simulators could do it.
    - Con - Only works for paired models from the same vendor.
      - The Rx knows a lot about the details of the Tx.
    - The EQM optimization function in the .dll can adjust the Rx and the
      proxy Tx.
      - All three are in the Rx .dll.
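  [Illustrative sketch, not from the presentation: one way a proxy Rx model
  could apply candidate Tx equalization internally.  Because the Tx
  equalization is assumed to be LTI, the Rx model can convolve the impulse
  response it receives in AMI_Init() with a candidate Tx FIR, score the
  result with its internal EQM, and repeat.  The function name, tap layout,
  and UI spacing below are hypothetical.]

      /* Apply a candidate Tx FIR (taps spaced one UI apart, e.g. main and
       * post-cursor) to the impulse response seen at the Rx input.  The
       * array is modified in place; iterating from the end keeps earlier
       * samples intact until they have been used. */
      static void apply_proxy_tx_fir(double *impulse, long num_samples,
                                     long samples_per_ui,
                                     const double *taps, int num_taps)
      {
          for (long i = num_samples - 1; i >= 0; i--) {
              double acc = 0.0;
              for (int t = 0; t < num_taps; t++) {
                  long j = i - (long)t * samples_per_ui;
                  if (j >= 0)
                      acc += taps[t] * impulse[j];
              }
              impulse[i] = acc;
          }
      }

  After scoring each candidate with its EQM, the Rx model would report the
  winning tap values in its output parameters so the user can program the
  real Tx - the "report back its optimized settings" step noted above.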
- Arpad - How could this proxy Tx be used effectively if its output doesn't
  go through the channel?
- Todd - The channel does come into it.
  - Good point; the assumption is that the Tx's equalization is LTI.
  - So it doesn't matter where we apply it.
- Arpad - In an LTI system the order between the Tx and the channel doesn't
  matter.
- Todd - Yes.
- Arpad - Would the Tx side of the diagram have to be modified?
  - The tool would have to stimulate the channel directly, not through the
    Tx.
- Todd - The Tx has analog behavior we need to capture (bandwidth limit,
  reflections, etc.).
  - The Tx also has algorithmic behavior.
- Walter - Arpad, note the "no EQ" next to the Tx block.
  - The Tx has no equalization.  All Tx equalization is done in the Rx
    proxy.
- Arpad - Okay, I missed that ["no EQ"].
- John - There could be jitter associated with the Tx.
  - That would throw a monkey wrench into this [LTI assumption].
- Todd - That's true.
  - This is not perfect.
  - I'm just observing that some people have used this with success.
- Arpad - What is the motivation?  Why would someone want to do this?
- Walter - Without a back-channel or link optimization BIRD, it's the only
  way.
  - The only way to have an Rx optimize the Tx.
- Todd - Alternatively, let's go back to the stock AMI slide.
  - Tx, Rx with auto mode (adaptive CTLE, DFE).
  - The million dollar question - how do I set up my transmitter?
  - If I look at all the combinations of Tx taps at minimum resolution,
    there could be thousands.
    - Probably not practical.
  - Find methods with coarser studies, or go with a Tx/Rx pair and the
    proxy solution.
- Todd - Co-Optimization Analysis Flow:
  - Much of this is very similar to what Cadence already observed.
  - In hardware, link training (co-optimization) happens before the system
    runs.
    - When training is done, normal system operation follows.
  - We want simulation-based co-optimization to do the same thing.
  - Today - IBIS 6.0:
    - Network characterization phase (impulse response).
    - Followed by channel simulation.
  - Co-optimization (proposed):
    - Network characterization as we always did it.
    - Co-optimization phase (Init() and GetWave()).
    - Go into channel simulation, but don't rerun Init().
- Todd - [Co-Optimization block diagram slide]
  - Trying to show the functional blocks, regardless of how they'd be
    packaged.
  - Tx Exploration Algorithm in the Rx model.
    - That algorithm has to be able to send a message back to the Tx.
  - TxConfigurator block.
    - A level of indirection.
    - Accepts messages from the Exploration Algorithm and translates them
      for the Tx.
    - May be a trivial pass-through, or may need to map them.
      - Walter has discussed tap coefficients vs. increments, for example.
  - This is all gathered in the Tx and Rx .dlls.
  - What are we optimizing?
    - EQM - a quantitative metric internal to the Rx model.
    - Need not be defined to the outside world at all.
    - But it does drive the Rx model's optimization.
  - Who's doing the optimizing?
    - In our view, the Rx model.
  - What algorithm is getting used?
    - In Scenario #1, following the exact protocol the hardware follows.
      - Quite possibly a local optimum.
    - In Scenario #2, we would have an algorithm.
      - Expect the optimum to be more global.
  - What training modes do we need to support?
    - Scenario #1, requirement #1.
      - Bit-by-bit hardware simulation (GetWave()).
    - Scenario #2, requirement #4 (high throughput).
      - We think it requires Init() based optimization.
- Todd - Diagram Flows:
  - Time Domain Link Training.
    - The flow we've discussed many times.
    - Tx and Rx GetWave() run in a training mode.
  - Statistical (Init() based) training.
    - Need a training mode in Init().
    - But Init() is currently not re-entrant.
      - It does initialization, memory allocation, impulse processing.
    - Our proposal is a new function:
      - AMI_Impulse() - like the GetWave() version of Init().
      - Does not do the initialization, memory allocation, etc.
      - Call Init() once, then call AMI_Impulse() as often as needed.
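  [Hypothetical sketch, not from the presentation: the proposed
  AMI_Impulse() entry point was described only in concept, and no prototype
  was given.  The argument list below is a guess patterned on the existing
  IBIS-AMI AMI_Init() and AMI_GetWave() signatures; every parameter shown is
  an assumption.]

      /* Re-entrant impulse-response processing: AMI_Init() is called once
       * to allocate memory and do one-time setup, after which the tool may
       * call AMI_Impulse() repeatedly with updated settings during the
       * co-optimization phase. */
      long AMI_Impulse(double  *impulse_matrix,     /* modified in place      */
                       long     row_size,           /* samples per column     */
                       long     aggressors,         /* aggressor columns      */
                       char    *AMI_parameters_in,  /* updated settings       */
                       char   **AMI_parameters_out, /* model's response       */
                       void    *AMI_memory);        /* handle from AMI_Init() */

  Keeping one-time setup out of AMI_Impulse() is what makes repeated calls
  cheap; one simple implementation strategy (noted in the discussion below)
  is for AMI_Init() itself to do the allocation and then call AMI_Impulse()
  for everything else.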
- Arpad - Couldn't this have been done just by repeatedly calling Init()?
- MM - But Init() allocates memory, right?
- Todd - We could do it with Init() if you come up with a way to tell Init()
  not to do the memory allocation every time.
  - If we wanted to change Init() behavior, we could.
  - We are proposing just creating this new function.
- Arpad - Okay, I understand.
- John - You could attack it with an AMI parameter that tells Init() to
  behave differently, but good luck to people learning that spec.
- Todd - We had this debate of new function vs. changing legacy stuff.
- ML - This is actually an easy architecture to implement.
  - Implement an Init() function that does the allocation and then calls
    AMI_Impulse() to do everything else.
- MM - There are tools out there that already do this.
- Todd - [moving on]
  - Init() training followed by GetWave() training.
    - Each can loop until it's done and then move on.
  - The final step in the diagram shows an additional step we hypothesized:
    - Train in Init().
    - Refine in GetWave().
    - Go back and get a final refinement of the impulse response for a
      statistical simulation flow.
  - If we buy into all these flows, how do we handle flow control?
    - Tx and Rx read and write link training data.
    - Propose a state variable the simulator uses to tell the models what
      state they are operating under (a rough sketch follows below).
    - The Rx model returns a state value indicating:
      - keep training
      - stop training
      - abort
    - The Rx model really controls when a phase of training is complete.
  - In terms of .dlls:
    - Init() doesn't change.
    - New AMI_Impulse() function for repeated calls.
    - GetWave() - needs BIRD 128 for parameters_InOut.
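  [Hypothetical encoding, not from the presentation: the training-state
  variable was proposed only in concept, with no names or values defined.
  One conceivable form is an enumeration carried through the AMI parameter
  strings (the BIRD 128 parameters in/out mechanism for GetWave()); the
  parameter name "Training_State" and the values below are illustrative
  assumptions.]

      /* Possible training-state values exchanged between the tool and the
       * models.  The tool tells the models what state they are operating
       * under; the Rx returns a value saying whether to keep training,
       * stop, or abort. */
      typedef enum {
          TRAINING_OFF      = 0,  /* normal (mission mode) simulation       */
          TRAINING_CONTINUE = 1,  /* keep training - run another iteration  */
          TRAINING_DONE     = 2,  /* stop training - this phase is complete */
          TRAINING_ABORT    = 3   /* abort - link will not converge         */
      } training_state_t;

      /* Tool -> model (parameters in):     (Training_State 1)
       * Rx model -> tool (parameters out): (Training_State 2)              */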
- Todd - That's enough for today.
  - Does what I said make sense?
- MM - This makes sense.
  - Particularly for cases where you really want to simulate everything
    using impulse responses only.
  - There are plenty of cases where you never want to touch GetWave(), and
    this does it.
- Todd - Okay.
  - I just want to know if I made sense to everyone.
- Arpad - Trying to look at this from a distance compared to BIRD 147.
  - The only difference I can see is what you described earlier in the
    presentation.
  - BIRD 147 addresses simulating exactly what the physical device does.
  - This addresses everything else.
  - Is that a fair summary?
- Todd - They're largely similar.
  - BIRD 147 has had bits and pieces added over time to address some of
    this.
  - I don't think BIRD 147 is expressed that clearly.
  - Biggest differences in implementation:
    - A re-entrant entry point for Init() based processing.
    - Pull out the state variables in a more dedicated way.
- Walter - There will be another difference we haven't gotten to yet.
  - That is: how to describe the messages getting sent back and forth?
- ML - Could go a lot further, but we need a longer meeting for that.
  - Really important to go back to slides 6 through 8 and talk about the
    requirements first.
- Arpad - This hasn't yet touched on some of the BIRD 147 debates over the
  .bci file.
- Todd - We originally said we were going to talk about market
  requirements.
  - Probably up through about slide 10 today.
  - Wanted to give you a broader context about what's coming.
- Radek - One question comes to my mind.
  - Why do you insist on the optimization being done by the Rx?
  - There are general optimization tools available in many EDA platforms.
  - Design optimization is an established process, not just blind sweeps.
  - We don't need to overload the Rx with that responsibility.
  - I think it's a valid approach.
  - I would like to understand why we don't include it in this discussion.
- Todd - Really good point.
  - Scenario #1 - Literally emulating what the hardware does.
- Radek - Yes, but Scenario #2 is open for other applications.
- Walter - I think what you're talking about is the Tx Exploration
  Algorithm.
  - One thing is to put it into the Rx.
  - What you're saying is there might be some other external algorithm.
  - In that case the Rx would have to report an EQM as an output.
- Radek - I don't want to have a prolonged discussion at this moment.
  - I just wanted to mention it for thought.
- Todd - You make a good point.
  - I think optimizers are more prevalent in the traditional microwave
    space.
  - Not sure all tools in this space have them built in.
  - If we want to pursue it, then what info will the external optimizer
    need?
- Radek - Exactly.
- Arpad - The interface to these models may need options for this.
  - Parameters that go to the Tx may not care who did the optimization.
- Todd - Walter has been on this for a while.
  - Trying to characterize Tx models.
  - Come up with defined data to describe them.
  - We could then allow other algorithms to treat them as black boxes.
  - Walter hasn't gotten traction on that, but we've been on it.
- Arpad - Okay, thank you for the presentation.
  - Thank you everyone for all the good work this year.
  - Happy Holidays and Happy New Year.

-------------
Next meeting: 06 Jan 2015 12:00pm PT

-------------
IBIS Interconnect SPICE Wish List:

1) Simulator directives